A Language for Nested Data Parallel Design-space Exploration on GPUs
Authors
Abstract
Graphics Processing Units (GPUs) offer potential for very high performance; they are also rapidly evolving. Obsidian is an embedded language (in Haskell) for implementing high performance kernels to be run on GPUs. We would like to have our cake and eat it too; we want to raise the level of abstraction beyond CUDA code and still give the programmer control over the details relevant to kernel performance. To that end, Obsidian includes guaranteed elimination of intermediate arrays and predictable space/time costs, while also providing array functions that are polymorphic across different levels of the GPUs' hierarchical structure, providing a limited form of nested data parallelism. We walk through case studies that demonstrate how to use Obsidian for rapid design exploration or auto-tuning, resulting in better performance than hand-tuned kernels in an existing GPU language.
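The "guaranteed elimination of intermediate arrays" mentioned in the abstract can be illustrated with delayed arrays, where an array is represented as a length plus an indexing function, so that composed operations never materialize a temporary. The sketch below is in Python rather than Haskell and is not Obsidian's API: the names `Pull`, `from_list`, `pmap`, `pzip_with`, and `force` are hypothetical, and the abstract itself does not state which mechanism Obsidian uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Pull:
    """A delayed array: a length plus a function from index to element.

    Hypothetical illustration only; this is not Obsidian's actual type.
    """
    length: int
    index: Callable[[int], float]


def from_list(xs: Iterable[float]) -> Pull:
    data = list(xs)
    return Pull(len(data), lambda i: data[i])


def pmap(f: Callable[[float], float], arr: Pull) -> Pull:
    # Composes f with the index function; no element is computed here.
    return Pull(arr.length, lambda i: f(arr.index(i)))


def pzip_with(f: Callable[[float, float], float], a: Pull, b: Pull) -> Pull:
    # The result is as long as the shorter input; still nothing is computed.
    n = min(a.length, b.length)
    return Pull(n, lambda i: f(a.index(i), b.index(i)))


def force(arr: Pull) -> List[float]:
    # The only place where storage for a result is allocated.
    return [arr.index(i) for i in range(arr.length)]


xs = from_list([1.0, 2.0, 3.0, 4.0])
ys = from_list([10.0, 20.0, 30.0, 40.0])

# The zip and the map are fused: the product array is never stored.
fused = pmap(lambda v: v + 1.0, pzip_with(lambda a, b: a * b, xs, ys))
result = force(fused)
print(result)
```

Because each combinator only builds a new indexing function, the cost of the pipeline is paid once, in `force`, which is what makes the space and time behaviour easy to predict in this style.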
Similar resources
A language for hierarchical data parallel design-space exploration on GPUs
Graphics Processing Units (GPUs) offer potential for very high performance; they are also rapidly evolving. Obsidian is an embedded language (in Haskell) for implementing high performance kernels to be run on GPUs. We would like to have our cake and eat it too; we want to raise the level of abstraction beyond CUDA code and still give the programmer control over the details relevant to kernel pe...
Design Space Exploration for GPU-Based Architecture
Recent advances in Graphics Processing Units (GPUs) provide opportunities to exploit GPUs for non-graphics applications. Scientific computation is inherently parallel, which is a good candidate to utilize the computing power of GPUs. This report investigates QR factorization, which is an important building block of scientific computation. We analyze different mapping methods of QR factorization...
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parall...
Functional programming for nested data parallelism on GPUs
Recent advances in general purpose GPU computing technology allow new data parallel kernel jobs to be dispatched dynamically during kernel execution. This enables significantly more expressive programming using nested data parallelism (NDP), where the restrictive need for flat data structures and computation has been lifted. Functional programming is fundamentally well suited for expressing dat...
Design Flow for GPU and Multicore Execution of Dynamic Dataflow Programs
Dataflow programming has received increasing attention in the age of multicore and heterogeneous computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by ...